Search | BVS Regional Portal

The complete sequence of a human genome.

Nurk, Sergey; Koren, Sergey; Rhie, Arang; Rautiainen, Mikko; Bzikadze, Andrey V; Mikheenko, Alla; Vollger, Mitchell R; Altemose, Nicolas; Uralsky, Lev; Gershman, Ariel; Aganezov, Sergey; Hoyt, Savannah J; Diekhans, Mark; Logsdon, Glennis A; Alonge, Michael; Antonarakis, Stylianos E; Borchers, Matthew; Bouffard, Gerard G; Brooks, Shelise Y; Caldas, Gina V; Chen, Nae-Chyun; Cheng, Haoyu; Chin, Chen-Shan; Chow, William; de Lima, Leonardo G; Dishuck, Philip C; Durbin, Richard; Dvorkina, Tatiana; Fiddes, Ian T; Formenti, Giulio; Fulton, Robert S; Fungtammasan, Arkarachai; Garrison, Erik; Grady, Patrick G S; Graves-Lindsay, Tina A; Hall, Ira M; Hansen, Nancy F; Hartley, Gabrielle A; Haukness, Marina; Howe, Kerstin; Hunkapiller, Michael W; Jain, Chirag; Jain, Miten; Jarvis, Erich D; Kerpedjiev, Peter; Kirsche, Melanie; Kolmogorov, Mikhail; Korlach, Jonas; Kremitzki, Milinn; Li, Heng.

Science ; 376(6588): 44-53, 2022 Apr.

Article in English | MEDLINE | ID: mdl-35357919

ABSTRACT

Since its initial release in 2000, the human reference genome has covered only the euchromatic fraction of the genome, leaving important heterochromatic regions unfinished. Addressing the remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium presents a complete 3.055 billion-base pair sequence of a human genome, T2T-CHM13, that includes gapless assemblies for all chromosomes except Y, corrects errors in the prior references, and introduces nearly 200 million base pairs of sequence containing 1956 gene predictions, 99 of which are predicted to be protein coding. The completed regions include all centromeric satellite arrays, recent segmental duplications, and the short arms of all five acrocentric chromosomes, unlocking these complex regions of the genome to variational and functional studies.

Subjects

Human Genome, Human Genome Project, DNA Sequence Analysis/standards, Cell Line, Bacterial Artificial Chromosomes/genetics, Human Chromosomes/genetics, Humans, Reference Values

Improved assembly and variant detection of a haploid human genome using single-molecule, high-fidelity long reads.

Vollger, Mitchell R; Logsdon, Glennis A; Audano, Peter A; Sulovari, Arvis; Porubsky, David; Peluso, Paul; Wenger, Aaron M; Concepcion, Gregory T; Kronenberg, Zev N; Munson, Katherine M; Baker, Carl; Sanders, Ashley D; Spierings, Diana C J; Lansdorp, Peter M; Surti, Urvashi; Hunkapiller, Michael W; Eichler, Evan E.

Ann Hum Genet ; 84(2): 125-140, 2020 Mar.

Article in English | MEDLINE | ID: mdl-31711268

ABSTRACT

The sequence and assembly of human genomes using long-read sequencing technologies has revolutionized our understanding of structural variation and genome organization. We compared the accuracy, continuity, and gene annotation of genome assemblies generated from either high-fidelity (HiFi) or continuous long-read (CLR) datasets from the same complete hydatidiform mole human genome. We find that the HiFi sequence data assemble an additional 10% of duplicated regions and more accurately represent the structure of tandem repeats, as validated with orthogonal analyses. As a result, an additional 5 Mbp of pericentromeric sequences are recovered in the HiFi assembly, resulting in a 2.5-fold increase in the NG50 within 1 Mbp of the centromere (HiFi 480.6 kbp, CLR 191.5 kbp). Additionally, the HiFi genome assembly was generated in significantly less time with fewer computational resources than the CLR assembly. Although the HiFi assembly has significantly improved continuity and accuracy in many complex regions of the genome, it still falls short of the assembly of centromeric DNA and the largest regions of segmental duplication using existing assemblers. Despite these shortcomings, our results suggest that HiFi may be the most effective standalone technology for de novo assembly of human genomes.

Subjects

Biomarkers/analysis, Genetic Variation, Human Genome, Haploidy, Hydatidiform Mole/genetics, DNA Sequence Analysis/methods, Single-Cell Analysis/methods, Female, High-Throughput Nucleotide Sequencing, Humans, Molecular Sequence Annotation, Pregnancy
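
The abstract above reports continuity as NG50, and other records in this list report contig N50. As a minimal sketch of how those two statistics are conventionally computed (the function names `n50` and `ng50` are illustrative and do not come from any of the cited papers or their software):

```python
def ng50(lengths, genome_size):
    """Length of the contig at which the running total of contig lengths,
    taken longest first, reaches half of `genome_size`.

    Returns 0 if the contigs together cover less than half of `genome_size`.
    """
    running_total = 0
    for length in sorted(lengths, reverse=True):
        running_total += length
        if 2 * running_total >= genome_size:
            return length
    return 0


def n50(lengths):
    """N50 is NG50 with the assembly's own total length as the reference size."""
    return ng50(lengths, sum(lengths))


# Toy contig lengths (arbitrary units), not data from any cited assembly.
toy_contigs = [80, 70, 50, 40, 30, 20]
print(n50(toy_contigs))        # half of 290 is reached inside the 70-unit contig
print(ng50(toy_contigs, 400))  # a larger reference size pushes the value down
```

Because NG50 is normalized by an external genome size rather than by the assembly's own total, it stays comparable between assemblies that recover different amounts of sequence, which is presumably why it suits a HiFi-versus-CLR comparison.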

Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome.

Wenger, Aaron M; Peluso, Paul; Rowell, William J; Chang, Pi-Chuan; Hall, Richard J; Concepcion, Gregory T; Ebler, Jana; Fungtammasan, Arkarachai; Kolesnikov, Alexey; Olson, Nathan D; Töpfer, Armin; Alonge, Michael; Mahmoud, Medhat; Qian, Yufeng; Chin, Chen-Shan; Phillippy, Adam M; Schatz, Michael C; Myers, Gene; DePristo, Mark A; Ruan, Jue; Marschall, Tobias; Sedlazeck, Fritz J; Zook, Justin M; Li, Heng; Koren, Sergey; Carroll, Andrew; Rank, David R; Hunkapiller, Michael W.

Nat Biotechnol ; 37(10): 1155-1162, 2019 Oct.

Article in English | MEDLINE | ID: mdl-31406327

ABSTRACT

The DNA sequencing technologies in use today produce either highly accurate short reads or less-accurate long reads. We report the optimization of circular consensus sequencing (CCS) to improve the accuracy of single-molecule real-time (SMRT) sequencing (PacBio) and generate highly accurate (99.8%) long high-fidelity (HiFi) reads with an average length of 13.5 kilobases (kb). We applied our approach to sequence the well-characterized human HG002/NA24385 genome and obtained precision and recall rates of at least 99.91% for single-nucleotide variants (SNVs), 95.98% for insertions and deletions <50 bp (indels) and 95.99% for structural variants. Our CCS method matches or exceeds the ability of short-read sequencing to detect small variants and structural variants. We estimate that 2,434 discordances are correctable mistakes in the 'genome in a bottle' (GIAB) benchmark set. Nearly all (99.64%) variants can be phased into haplotypes, further improving variant detection. De novo genome assembly using CCS reads alone produced a contiguous and accurate genome with a contig N50 of >15 megabases (Mb) and concordance of 99.997%, substantially outperforming assembly with less-accurate long reads.

Subjects

Circular DNA/genetics, Human Genome, High-Throughput Nucleotide Sequencing/methods, DNA Sequence Analysis/methods, Base Sequence, Genetic Variation, Haplotypes, Humans

De novo assembly and phasing of a Korean human genome.

Seo, Jeong-Sun; Rhie, Arang; Kim, Junsoo; Lee, Sangjin; Sohn, Min-Hwan; Kim, Chang-Uk; Hastie, Alex; Cao, Han; Yun, Ji-Young; Kim, Jihye; Kuk, Junho; Park, Gun Hwa; Kim, Juhyeok; Ryu, Hanna; Kim, Jongbum; Roh, Mira; Baek, Jeonghun; Hunkapiller, Michael W; Korlach, Jonas; Shin, Jong-Yeon; Kim, Changhoon.

Nature ; 538(7624): 243-247, 2016 Oct 13.

Article in English | MEDLINE | ID: mdl-27706134

ABSTRACT

Advances in genome assembly and phasing provide an opportunity to investigate the diploid architecture of the human genome and reveal the full range of structural variation across population groups. Here we report the de novo assembly and haplotype phasing of the Korean individual AK1 (ref. 1) using single-molecule real-time sequencing, next-generation mapping, microfluidics-based linked reads, and bacterial artificial chromosome (BAC) sequencing approaches. Single-molecule sequencing coupled with next-generation mapping generated a highly contiguous assembly, with a contig N50 size of 17.9 Mb and a scaffold N50 size of 44.8 Mb, resolving 8 chromosomal arms into single scaffolds. The de novo assembly, along with local assemblies and spanning long reads, closes 105 and extends into 72 out of 190 euchromatic gaps in the reference genome, adding 1.03 Mb of previously intractable sequence. High concordance between the assembly and paired-end sequences from 62,758 BAC clones provides strong support for the robustness of the assembly. We identify 18,210 structural variants by direct comparison of the assembly with the human reference, identifying thousands of breakpoints that, to our knowledge, have not been reported before. Many of the insertions are reflected in the transcriptome and are shared across the Asian population. We performed haplotype phasing of the assembly with short reads, long reads and linked reads from whole-genome sequencing and with short reads from 31,719 BAC clones, thereby achieving phased blocks with an N50 size of 11.6 Mb. Haplotigs assembled from single-molecule real-time reads assigned to haplotypes on phased blocks covered 89% of genes. The haplotigs accurately characterized the hypervariable major histocompatibility complex region as well as demonstrating allele configuration in clinically relevant genes such as CYP2D6. This work presents the most contiguous diploid human genome assembly so far, with extensive investigation of unreported and Asian-specific structural variants, and high-quality haplotyping of clinically relevant alleles for precision medicine.

Subjects

Asian People/genetics, Contig Mapping, Human Genome/genetics, Genomics, Haplotypes/genetics, DNA Sequence Analysis, Alleles, Bacterial Artificial Chromosomes/genetics, Cytochrome P-450 CYP2D6/genetics, Diploidy, Genetic Variation/genetics, Histocompatibility Antigens Class II/genetics, Humans, Precision Medicine, Reference Standards, Republic of Korea

Resolving the complexity of the human genome using single-molecule sequencing.

Chaisson, Mark J P; Huddleston, John; Dennis, Megan Y; Sudmant, Peter H; Malig, Maika; Hormozdiari, Fereydoun; Antonacci, Francesca; Surti, Urvashi; Sandstrom, Richard; Boitano, Matthew; Landolin, Jane M; Stamatoyannopoulos, John A; Hunkapiller, Michael W; Korlach, Jonas; Eichler, Evan E.

Nature ; 517(7536): 608-11, 2015 Jan 29.

Article in English | MEDLINE | ID: mdl-25383537

ABSTRACT

The human genome is arguably the most complete mammalian reference assembly, yet more than 160 euchromatic gaps remain and aspects of its structural variation remain poorly understood ten years after its completion. To identify missing sequence and genetic variation, here we sequence and analyse a haploid human genome (CHM1) using single-molecule, real-time DNA sequencing. We close or extend 55% of the remaining interstitial gaps in the human GRCh37 reference genome--78% of which carried long runs of degenerate short tandem repeats, often several kilobases in length, embedded within (G+C)-rich genomic regions. We resolve the complete sequence of 26,079 euchromatic structural variants at the base-pair level, including inversions, complex insertions and long tracts of tandem repeats. Most have not been previously reported, with the greatest increases in sensitivity occurring for events less than 5 kilobases in size. Compared to the human reference, we find a significant insertional bias (3:1) in regions corresponding to complex insertions and long short tandem repeats. Our results suggest a greater complexity of the human genome in the form of variation of longer and more complex repetitive DNA that can now be largely resolved with the application of this longer-read sequencing technology.

Subjects

Genetic Variation/genetics, Human Genome/genetics, Genomics, DNA Sequence Analysis/methods, Chromosome Inversion/genetics, Human Chromosome Pair 10/genetics, Molecular Cloning, GC Rich Sequence/genetics, Haploidy, Humans, Insertional Mutagenesis/genetics, Reference Standards, Tandem Repeat Sequences/genetics

The linkage disequilibrium maps of three human chromosomes across four populations reflect their demographic history and a common underlying recombination pattern.

De La Vega, Francisco M; Isaac, Hadar; Collins, Andrew; Scafe, Charles R; Halldórsson, Bjarni V; Su, Xiaoping; Lippert, Ross A; Wang, Yu; Laig-Webster, Marion; Koehler, Ryan T; Ziegle, Janet S; Wogan, Lewis T; Stevens, Junko F; Leinen, Kyle M; Olson, Sheri J; Guegler, Karl J; You, Xiaoqing; Xu, Lily H; Hemken, Heinz G; Kalush, Francis; Itakura, Mitsuo; Zheng, Yi; de Thé, Guy; O'Brien, Stephen J; Clark, Andrew G; Istrail, Sorin; Hunkapiller, Michael W; Spier, Eugene G; Gilbert, Dennis A.

Genome Res ; 15(4): 454-62, 2005 Apr.

Article in English | MEDLINE | ID: mdl-15781572

ABSTRACT

The extent and patterns of linkage disequilibrium (LD) determine the feasibility of association studies to map genes that underlie complex traits. Here we present a comparison of the patterns of LD across four major human populations (African-American, Caucasian, Chinese, and Japanese) with a high-resolution single-nucleotide polymorphism (SNP) map covering almost the entire length of chromosomes 6, 21, and 22. We constructed metric LD maps formulated such that the units measure the extent of useful LD for association mapping. LD reaches almost twice as far in chromosome 6 as in chromosomes 21 or 22, in agreement with their differences in recombination rates. By all measures used, out-of-Africa populations showed over a third more LD than African-Americans, highlighting the role of the population's demography in shaping the patterns of LD. Despite those differences, the long-range contour of the LD maps is remarkably similar across the four populations, presumably reflecting common localization of recombination hot spots. Our results have practical implications for the rational design and selection of SNPs for disease association studies.

Subjects

Chromosome Mapping, Human Chromosome Pair 21, Human Chromosome Pair 22, Human Chromosome Pair 6, Demography, Linkage Disequilibrium, Genetic Recombination, Black or African American/genetics, Asian People/genetics, Black People/genetics, Population Genetics, Humans, Single Nucleotide Polymorphism, White People/genetics
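
For readers unfamiliar with how linkage disequilibrium between two SNPs is quantified, the following is a minimal sketch of the standard pairwise statistics D, D' and r². It is not the metric LD map construction described in the abstract above, which is a different and more involved method; the function name `pairwise_ld` and the input frequencies are illustrative assumptions.

```python
def pairwise_ld(p_ab, p_a, p_b):
    """Return (D, D_prime, r_squared) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both loci must be polymorphic")

    # D is the departure of the haplotype frequency from independence.
    d = p_ab - p_a * p_b

    # D' rescales D by the largest magnitude it could take given the
    # allele frequencies; the bound depends on the sign of D.
    if d > 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0

    # r^2 is the squared correlation between the alleles at the two loci.
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r_squared


# Two toy cases: complete association, then independence.
print(pairwise_ld(0.5, 0.5, 0.5))
print(pairwise_ld(0.25, 0.5, 0.5))
```

This sketch keeps the sign of D', whereas many reports quote its absolute value.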

Whole-genome shotgun assembly and comparison of human genome assemblies.

Istrail, Sorin; Sutton, Granger G; Florea, Liliana; Halpern, Aaron L; Mobarry, Clark M; Lippert, Ross; Walenz, Brian; Shatkay, Hagit; Dew, Ian; Miller, Jason R; Flanigan, Michael J; Edwards, Nathan J; Bolanos, Randall; Fasulo, Daniel; Halldorsson, Bjarni V; Hannenhalli, Sridhar; Turner, Russell; Yooseph, Shibu; Lu, Fu; Nusskern, Deborah R; Shue, Bixiong Chris; Zheng, Xiangqun Holly; Zhong, Fei; Delcher, Arthur L; Huson, Daniel H; Kravitz, Saul A; Mouchard, Laurent; Reinert, Knut; Remington, Karin A; Clark, Andrew G; Waterman, Michael S; Eichler, Evan E; Adams, Mark D; Hunkapiller, Michael W; Myers, Eugene W; Venter, J Craig.

Proc Natl Acad Sci U S A ; 101(7): 1916-21, 2004 Feb 17.

Article in English | MEDLINE | ID: mdl-14769938

ABSTRACT

We report a whole-genome shotgun assembly (called WGSA) of the human genome generated at Celera in 2001. The Celera-generated shotgun data set consisted of 27 million sequencing reads organized in pairs by virtue of end-sequencing 2-kbp, 10-kbp, and 50-kbp inserts from shotgun clone libraries. The quality-trimmed reads covered the genome 5.3 times, and the inserts from which pairs of reads were obtained covered the genome 39 times. With the nearly complete human DNA sequence [National Center for Biotechnology Information (NCBI) Build 34] now available, it is possible to directly assess the quality, accuracy, and completeness of WGSA and of the first reconstructions of the human genome reported in two landmark papers in February 2001 [Venter, J. C., Adams, M. D., Myers, E. W., Li, P. W., Mural, R. J., Sutton, G. G., Smith, H. O., Yandell, M., Evans, C. A., Holt, R. A., et al. (2001) Science 291, 1304-1351; International Human Genome Sequencing Consortium (2001) Nature 409, 860-921]. The analysis of WGSA shows 97% order and orientation agreement with NCBI Build 34, where most of the 3% of sequence out of order is due to scaffold placement problems as opposed to assembly errors within the scaffolds themselves. In addition, WGSA fills some of the remaining gaps in NCBI Build 34. The early genome sequences all covered about the same amount of the genome, but they did so in different ways. The Celera results provide more order and orientation, and the consortium sequence provides better coverage of exact and nearly exact repeats.

Subjects

Computational Biology, Human Genome, Human Genome Project, Computational Biology/standards, Contig Mapping/standards, Humans, Messenger RNA/analysis, Software
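
The abstract above quotes sequence coverage as a fold value (27 million reads covering the genome 5.3 times). A minimal sketch of the usual relation, coverage = reads × mean read length / genome size, follows. The genome size used in the usage lines is an assumed round figure that the abstract does not state, and the function names are illustrative.

```python
def fold_coverage(num_reads, mean_read_length, genome_size):
    """Average number of times each base is covered by reads."""
    return num_reads * mean_read_length / genome_size


def implied_mean_length(coverage, num_reads, genome_size):
    """Mean read length implied by a stated fold coverage and read count."""
    return coverage * genome_size / num_reads


# ASSUMPTION: 3.0e9 bp is a round placeholder for the genome size;
# the abstract does not give the value used by the authors.
ASSUMED_GENOME_SIZE = 3.0e9

# Read count and fold coverage as stated in the abstract.
mean_len = implied_mean_length(5.3, 27e6, ASSUMED_GENOME_SIZE)
print(f"implied mean trimmed read length: {mean_len:.0f} bp")
```

The result is only as good as the assumed genome size, so it should be read as an order-of-magnitude check rather than a figure from the paper.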
